{
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {
                "colab_type": "text",
                "id": "view-in-github"
            },
            "source": [
                "<a href=\"https://colab.research.google.com/github/AI4Finance-Foundation/ElegantRL/blob/master/tutorial_BipedalWalker_v3.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {
                "id": "c1gUG3OCJ5GS"
            },
            "source": [
                "# **BipedalWalker-v3 Example in ElegantRL**\n",
                "\n",
                "\n",
                "\n",
                "\n"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {
                "id": "FGXyBBvL0dR2"
            },
            "source": [
                "# **Task Description**\n",
                "\n",
                "[BipedalWalker-v3](https://gym.openai.com/envs/BipedalWalker-v2/) is a robotics task in OpenAI Gym that exercises one of the most fundamental skills: moving. The goal is to get a 2D bipedal walker to walk across rough terrain. BipedalWalker is a difficult task with a continuous action space, and only a few RL implementations reach the target reward."
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {
                "id": "DbamGVHC3AeW"
            },
            "source": [
                "# **Part 1: Install ElegantRL**"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "metadata": {
                "colab": {
                    "base_uri": "https://localhost:8080/"
                },
                "id": "U35bhkUqOqbS",
                "outputId": "79ace170-9a20-46cd-db96-957fd42a472f"
            },
            "outputs": [],
            "source": [
                "# install elegantrl library\n",
                "!pip install git+https://github.com/AI4Finance-LLC/ElegantRL.git"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {
                "id": "UVdmpnK_3Zcn"
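            },
            "source": [
                "# **Part 2: Import Packages**\n",
                "\n",
                "\n",
                "*   **elegantrl**: provides the agent (`AgentPPO`), the configuration class (`Config`), and the training utilities used in this notebook.\n",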
                "*   **OpenAI Gym**: a toolkit for developing and comparing reinforcement learning algorithms.\n",
                "\n"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 1,
            "metadata": {
                "id": "AAPdjovQrTpE"
            },
            "outputs": [],
            "source": [
                "import gym\n",
                "from elegantrl.agents import AgentPPO\n",
                "from elegantrl.train.config import get_gym_env_args, Config\n",
                "from elegantrl.train.run import *\n",
                "\n",
                "gym.logger.set_level(40)  # 40 = ERROR: suppress gym warnings"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {
                "id": "z2Ik5cDoyPGU"
            },
            "source": [
                "# **Part 3: Get environment information**"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 2,
            "metadata": {
                "colab": {
                    "base_uri": "https://localhost:8080/"
                },
                "id": "wwkZXiHtyV6f",
                "outputId": "880d25f5-d1f0-4cd2-8f78-bb5409330101"
            },
            "outputs": [
                {
                    "data": {
                        "text/plain": [
                            "{'env_name': 'BipedalWalker-v3',\n",
                            " 'num_envs': 1,\n",
                            " 'max_step': 1600,\n",
                            " 'state_dim': 24,\n",
                            " 'action_dim': 4,\n",
                            " 'if_discrete': False}"
                        ]
                    },
                    "execution_count": 2,
                    "metadata": {},
                    "output_type": "execute_result"
                }
            ],
            "source": [
                "get_gym_env_args(gym.make(\"BipedalWalker-v3\"), if_print=False)"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {
                "id": "3n8zcgcn14uq"
            },
            "source": [
                "# **Part 4: Specify Agent and Environment**\n",
                "\n",
                "*   **agent**: the agent (DRL algorithm) to use, chosen from the set of agents in the [directory](https://github.com/AI4Finance-Foundation/ElegantRL/tree/master/elegantrl/agents). This notebook uses `AgentPPO`.\n",
                "*   **env_func**: the function that creates the environment. Here we use `gym.make` to create BipedalWalker-v3.\n",
                "*   **env_args**: the environment information, such as the state and action dimensions reported in Part 3.\n"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 8,
            "metadata": {
                "id": "E03f6cTeajK4"
            },
            "outputs": [],
            "source": [
                "env_func = gym.make\n",
                "env_args = {\n",
                "    \"num_envs\": 1,\n",
                "    \"env_name\": \"BipedalWalker-v3\",\n",
                "    \"max_step\": 1600,\n",
                "    \"state_dim\": 24,\n",
                "    \"action_dim\": 4,\n",
                "    \"if_discrete\": False,\n",
                "    \"target_return\": 300,\n",
                "    \"id\": \"BipedalWalker-v3\",\n",
                "}\n",
                "# env = build_env(env_class=env_func, env_args=env_args)\n",
                "args = Config(AgentPPO, env_class=env_func, env_args=env_args)"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {
                "id": "rcFcUkwfzHLE"
            },
            "source": [
                "# **Part 5: Specify hyper-parameters**\n",
                "A list of hyper-parameters is available [here](https://elegantrl.readthedocs.io/en/latest/api/config.html)."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 9,
            "metadata": {
                "id": "9WCAcmIfzGyE"
            },
            "outputs": [],
            "source": [
                "args.target_step = args.max_step * 4  # 4 * 1600 = 6400 steps\n",
                "args.gamma = 0.98  # discount factor for future rewards\n",
                "args.eval_times = 2**2  # number of evaluation episodes averaged per evaluation\n",
                "args.repeat_times = 8  # how many times collected data is reused for network updates"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {
                "id": "z1j5kLHF2dhJ"
            },
            "source": [
                "# **Part 6: Train and Evaluate the Agent**\n",
                "\n",
                "\n",
                "\n",
                "\n"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 10,
            "metadata": {
                "colab": {
                    "base_uri": "https://localhost:8080/"
                },
                "id": "KGOPSD6da23k",
                "outputId": "2a8ed03b-b306-45f8-c530-adf72438c5bd"
            },
            "outputs": [
                {
                    "name": "stdout",
                    "output_type": "stream",
                    "text": [
                        "| Arguments Remove cwd: ./BipedalWalker-v3_PPO_0\n",
                        "| Evaluator:\n",
                        "| `step`: Number of samples, or total training steps, or running times of `env.step()`.\n",
                        "| `time`: Time spent from the start of training to this moment.\n",
                        "| `avgR`: Average value of cumulative rewards, which is the sum of rewards in an episode.\n",
                        "| `stdR`: Standard dev of cumulative rewards, which is the sum of rewards in an episode.\n",
                        "| `avgS`: Average of steps in an episode.\n",
                        "| `objC`: Objective of Critic network. Or call it loss function of critic network.\n",
                        "| `objA`: Objective of Actor network. It is the average Q value of the critic network.\n",
                        "################################################################################\n",
                        "ID     Step    Time |    avgR   stdR   avgS  stdS |    expR   objC   objA   etc.\n"
                    ]
                },
                {
                    "name": "stderr",
                    "output_type": "stream",
                    "text": [
                        "/home/adhi/ElegantRL/.env-erl/lib/python3.10/site-packages/gym/utils/passive_env_checker.py:241: DeprecationWarning: `np.bool8` is a deprecated alias for `np.bool_`.  (Deprecated NumPy 1.24)\n",
                        "  if not isinstance(terminated, (bool, np.bool8)):\n"
                    ]
                },
                {
                    "name": "stdout",
                    "output_type": "stream",
                    "text": [
                        "0  2.05e+03       4 | -105.90    6.8    160     5 |   -5.64   1.22   0.06  -0.00\n",
                        "0  2.25e+04      40 | -101.19    0.3    156    32 |   -5.63   0.91   0.06  -0.00\n",
                        "0  4.30e+04      77 | -105.62    0.2    142     5 |   -5.65   1.96   0.06  -0.00\n",
                        "0  6.35e+04     116 | -106.94    0.1     96     2 |   -5.63   0.06   0.07  -0.00\n",
                        "0  8.40e+04     155 |  -76.43    0.8   1600     0 |   -5.69   0.08   0.05  -0.00\n"
                    ]
                },
                {
                    "name": "stdout",
                    "output_type": "stream",
                    "text": [
                        "0  1.04e+05     199 |  -72.13    0.1   1600     0 |   -5.62   0.07   0.05  -0.01\n"
                    ]
                }
            ],
            "source": [
                "train_agent(args)"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {
                "id": "JPXOxLSqh5cP"
            },
            "source": [
                "Understanding the above results:\n",
                "*   **Step**: the total number of training steps, i.e. calls to `env.step()`.\n",
                "*   **Time**: the time elapsed since the start of training.\n",
                "*   **avgR**: the average cumulative reward per episode.\n",
                "*   **stdR**: the standard deviation of the cumulative reward per episode.\n",
                "*   **avgS**: the average number of steps per episode.\n",
                "*   **objC**: the objective (loss) of the Critic Network (Value Network).\n",
                "*   **objA**: the objective function value of the Actor Network (Policy Network)."
            ]
        }
    ],
    "metadata": {
        "accelerator": "GPU",
        "colab": {
            "collapsed_sections": [],
            "include_colab_link": true,
            "name": "tutorial_BipedalWalker-v3.ipynb",
            "provenance": []
        },
        "kernelspec": {
            "display_name": "Python 3.9.13 64-bit (microsoft store)",
            "language": "python",
            "name": "python3"
        },
        "language_info": {
            "codemirror_mode": {
                "name": "ipython",
                "version": 3
            },
            "file_extension": ".py",
            "mimetype": "text/x-python",
            "name": "python",
            "nbconvert_exporter": "python",
            "pygments_lexer": "ipython3",
            "version": "3.10.6"
        },
        "vscode": {
            "interpreter": {
                "hash": "8fec15aaf15af2f7b25d7149644915fb0538c5beb7ab358bd639337cd8050469"
            }
        }
    },
    "nbformat": 4,
    "nbformat_minor": 0
}
